Analysis Of Boolean Functions

In mathematics and theoretical computer science, analysis of Boolean functions is the study of real-valued functions on \{0,1\}^n or \{-1,1\}^n (such functions are sometimes known as pseudo-Boolean functions) from a spectral perspective. The functions studied are often, but not always, Boolean-valued, making them Boolean functions. The area has found many applications in combinatorics, social choice theory, random graphs, and theoretical computer science, especially in hardness of approximation, property testing, and PAC learning.


Basic concepts

We will mostly consider functions defined on the domain \{-1,1\}^n. Sometimes it is more convenient to work with the domain \{0,1\}^n instead. If f is defined on \{-1,1\}^n, then the corresponding function defined on \{0,1\}^n is
: f_{01}(x_1,\ldots,x_n) = f((-1)^{x_1},\ldots,(-1)^{x_n}).
Similarly, for us a Boolean function is a \{-1,1\}-valued function, though often it is more convenient to consider \{0,1\}-valued functions instead.
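
As a concrete illustration, the following short Python sketch (the helper name to_01_domain is ours, not standard notation) carries out this change of domain for the majority function on three bits.

    def to_01_domain(f):
        """Given f on {-1,1}^n, return f_01(x) = f((-1)**x_1, ..., (-1)**x_n) on {0,1}^n."""
        return lambda x: f(tuple((-1) ** xi for xi in x))

    maj3 = lambda x: 1 if sum(x) > 0 else -1   # majority of three +/-1 bits
    maj3_01 = to_01_domain(maj3)
    print(maj3_01((0, 0, 1)))                  # bits (0,0,1) map to (+1,+1,-1), so the majority is +1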


Fourier expansion

Every real-valued function f\colon \{-1,1\}^n \to \mathbb{R} has a unique expansion as a multilinear polynomial:
: f(x) = \sum_{S \subseteq [n]} \hat{f}(S) \chi_S(x), \quad \chi_S(x) = \prod_{i \in S} x_i.
(Note that even if the function is 0-1 valued this is not a sum mod 2, but just an ordinary sum of real numbers.) This is the Hadamard transform of the function f, which is the Fourier transform in the group \mathbb{Z}_2^n. The coefficients \hat{f}(S) are known as ''Fourier coefficients'', and the entire sum is known as the ''Fourier expansion'' of f. The functions \chi_S are known as ''Fourier characters'', and they form an orthonormal basis for the space of all functions over \{-1,1\}^n, with respect to the inner product \langle f,g \rangle = 2^{-n} \sum_{x \in \{-1,1\}^n} f(x) g(x). The Fourier coefficients can be calculated using an inner product:
: \hat{f}(S) = \langle f, \chi_S \rangle.
In particular, this shows that \hat{f}(\emptyset) = \operatorname{E}[f], where the expected value is taken with respect to the uniform distribution over \{-1,1\}^n. Parseval's identity states that
: \|f\|^2 = \operatorname{E}[f^2] = \sum_S \hat{f}(S)^2.
If we skip S = \emptyset, then we get the variance of f:
: \operatorname{V}[f] = \sum_{S \neq \emptyset} \hat{f}(S)^2.
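
A minimal Python sketch (brute force, so only sensible for small n; the helper names are ours) that computes the Fourier coefficients \hat{f}(S) = \langle f, \chi_S \rangle by direct enumeration and checks Parseval's identity for the majority function on three bits:

    from itertools import product, combinations
    from math import prod

    def fourier_coefficients(f, n):
        """Return {S: f_hat(S)} where f_hat(S) = E_x[f(x) * chi_S(x)], x uniform on {-1,1}^n."""
        cube = list(product([-1, 1], repeat=n))
        return {S: sum(f(x) * prod(x[i] for i in S) for x in cube) / len(cube)
                for k in range(n + 1) for S in combinations(range(n), k)}

    maj3 = lambda x: 1 if sum(x) > 0 else -1
    fhat = fourier_coefficients(maj3, 3)
    print({S: c for S, c in fhat.items() if c != 0})   # {(0,): 0.5, (1,): 0.5, (2,): 0.5, (0, 1, 2): -0.5}
    print(sum(c ** 2 for c in fhat.values()))          # Parseval: equals E[f^2] = 1.0 for a +/-1-valued f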


Fourier degree and Fourier levels

The ''degree'' of a function f\colon \{-1,1\}^n \to \mathbb{R} is the maximum d such that \hat{f}(S) \neq 0 for some set S of size d. In other words, the degree of f is its degree as a multilinear polynomial. It is convenient to decompose the Fourier expansion into ''levels'': the Fourier coefficient \hat{f}(S) is on level |S|. The ''degree d'' part of f is
: f^{=d} = \sum_{|S|=d} \hat{f}(S) \chi_S.
It is obtained from f by zeroing out all Fourier coefficients not on level d. We similarly define f^{>d}, f^{<d}, f^{\geq d}, f^{\leq d}.
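
Continuing the brute-force sketch from the previous section (same caveats; the helper is repeated so the snippet runs on its own), the degree and the Fourier weight at each level can be read off directly from the coefficients:

    from itertools import product, combinations
    from math import prod

    def fourier_coefficients(f, n):   # same brute-force helper as in the previous sketch
        cube = list(product([-1, 1], repeat=n))
        return {S: sum(f(x) * prod(x[i] for i in S) for x in cube) / len(cube)
                for k in range(n + 1) for S in combinations(range(n), k)}

    maj3 = lambda x: 1 if sum(x) > 0 else -1
    fhat = fourier_coefficients(maj3, 3)
    degree = max((len(S) for S, c in fhat.items() if abs(c) > 1e-12), default=0)
    levels = {k: sum(c ** 2 for S, c in fhat.items() if len(S) == k) for k in range(4)}
    print(degree)   # 3, since the top coefficient f_hat({0,1,2}) = -1/2 is nonzero
    print(levels)   # Fourier weight per level: {0: 0.0, 1: 0.75, 2: 0.0, 3: 0.25}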


Influence

The i'th influence of a function f\colon \{-1,1\}^n \to \mathbb{R} can be defined in two equivalent ways:
: \operatorname{Inf}_i[f] = \operatorname{E}\left[\left(\frac{f - f^{\oplus i}}{2}\right)^2\right] = \sum_{S \ni i} \hat{f}(S)^2, \qquad f^{\oplus i}(x_1,\ldots,x_n) = f(x_1,\ldots,x_{i-1},-x_i,x_{i+1},\ldots,x_n).
If f is Boolean then \operatorname{Inf}_i[f] is the probability that flipping the i'th coordinate flips the value of the function:
: \operatorname{Inf}_i[f] = \Pr[f(x) \neq f^{\oplus i}(x)].
If \operatorname{Inf}_i[f] = 0 then f doesn't depend on the i'th coordinate. The ''total influence'' of f is the sum of all of its influences:
: \operatorname{Inf}[f] = \sum_{i=1}^n \operatorname{Inf}_i[f] = \sum_S |S| \hat{f}(S)^2.
The total influence of a Boolean function is also the ''average sensitivity'' of the function. The ''sensitivity'' of a Boolean function f at a given point is the number of coordinates i such that if we flip the i'th coordinate, the value of the function changes. The average value of this quantity is exactly the total influence. The total influence can also be defined using the discrete Laplacian of the Hamming graph, suitably normalized: \operatorname{Inf}[f] = \langle f, Lf \rangle. A generalized form of influence is the \rho-stable influence, defined by:
: \operatorname{Inf}_i^{(\rho)}[f] = \operatorname{Stab}_\rho[\operatorname{D}_i f] = \sum_{S \ni i} \rho^{|S|-1} \hat{f}(S)^2.
The corresponding total influence is
: \operatorname{I}^{(\rho)}[f] = \frac{d}{d\rho} \operatorname{Stab}_\rho[f] = \sum_S |S| \rho^{|S|-1} \hat{f}(S)^2.
One can prove that a function f\colon \{-1,1\}^n \to \{-1,1\} has at most "constantly" many "stably-influential" coordinates: |\{i \in [n] : \operatorname{Inf}_i^{(\rho)}[f] \geq \epsilon\}| \le \frac{1}{(1-\rho)\epsilon}.
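
The equivalence between the combinatorial and spectral definitions is easy to check by brute force; a small Python sketch (our own helper names, small n only):

    from itertools import product

    def influence(f, n, i):
        """Inf_i[f] = Pr_x[f(x) != f(x with coordinate i flipped)], x uniform on {-1,1}^n."""
        cube = list(product([-1, 1], repeat=n))
        flip = lambda x: x[:i] + (-x[i],) + x[i + 1:]
        return sum(f(x) != f(flip(x)) for x in cube) / len(cube)

    maj3 = lambda x: 1 if sum(x) > 0 else -1
    infs = [influence(maj3, 3, i) for i in range(3)]
    print(infs)        # [0.5, 0.5, 0.5]: a coordinate of Maj_3 is pivotal exactly when the other two disagree
    print(sum(infs))   # total influence = average sensitivity = 1.5
    # Spectral check: sum_S |S| * f_hat(S)^2 = 3 * 1 * (1/2)^2 + 3 * (1/2)^2 = 1.5, matching the total above.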


Noise stability

Given -1 \leq \rho \leq 1, we say that two random vectors x,y \in \{-1,1\}^n are ''\rho-correlated'' if the marginal distributions of x,y are uniform, and \operatorname{E}[x_i y_i] = \rho. Concretely, we can generate a pair of \rho-correlated random variables by first choosing x,z \in \{-1,1\}^n uniformly at random, and then choosing y according to one of the following two equivalent rules, applied independently to each coordinate:
: y_i = \begin{cases} x_i & \text{with probability } \rho, \\ z_i & \text{with probability } 1-\rho, \end{cases} \qquad \text{or} \qquad y_i = \begin{cases} x_i & \text{with probability } \frac{1+\rho}{2}, \\ -x_i & \text{with probability } \frac{1-\rho}{2}. \end{cases}
We denote this distribution by y \sim N_\rho(x). The ''noise stability'' of a function f\colon \{-1,1\}^n \to \mathbb{R} at \rho can be defined in two equivalent ways:
: \operatorname{Stab}_\rho[f] = \operatorname{E}_{x, y \sim N_\rho(x)}[f(x) f(y)] = \sum_{S \subseteq [n]} \rho^{|S|} \hat{f}(S)^2.
For 0 \leq \delta \leq 1, the ''noise sensitivity'' of f at \delta is
: \operatorname{NS}_\delta[f] = \tfrac{1}{2} - \tfrac{1}{2} \operatorname{Stab}_{1-2\delta}[f].
If f is Boolean, then this is the probability that the value of f changes if we flip each coordinate with probability \delta, independently.
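
A small Python sketch (exact enumeration, small n only; helper names are ours) computing the noise stability of Maj_3 from the definition and comparing it with the Fourier-side formula:

    from itertools import product

    def noise_stability(f, n, rho):
        """Stab_rho[f] = E[f(x) f(y)] with x uniform and y ~ N_rho(x), by exact enumeration."""
        cube = list(product([-1, 1], repeat=n))
        stab = 0.0
        for x in cube:
            for y in cube:
                # y_i agrees with x_i with probability (1 + rho)/2, independently over coordinates
                p = 1.0
                for xi, yi in zip(x, y):
                    p *= (1 + rho) / 2 if xi == yi else (1 - rho) / 2
                stab += f(x) * f(y) * p / len(cube)
        return stab

    maj3 = lambda x: 1 if sum(x) > 0 else -1
    rho = 0.6
    print(noise_stability(maj3, 3, rho))   # 0.504 (up to rounding)
    print(0.75 * rho + 0.25 * rho ** 3)    # the Fourier formula sum_S rho^|S| f_hat(S)^2 gives the same value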


Noise operator

The ''noise operator'' T_\rho is an operator taking a function f\colon \{-1,1\}^n \to \mathbb{R} and returning another function T_\rho f\colon \{-1,1\}^n \to \mathbb{R} given by
: (T_\rho f)(x) = \operatorname{E}_{y \sim N_\rho(x)}[f(y)] = \sum_{S \subseteq [n]} \rho^{|S|} \hat{f}(S) \chi_S.
When \rho > 0, the noise operator can also be defined using a continuous-time Markov chain in which each bit is flipped independently with rate 1. The operator T_\rho corresponds to running this Markov chain for \tfrac{1}{2}\log\tfrac{1}{\rho} steps starting at x, and taking the average value of f at the final state. This Markov chain is generated by the Laplacian of the Hamming graph, and this relates total influence to the noise operator. Noise stability can be defined in terms of the noise operator: \operatorname{Stab}_\rho[f] = \langle f, T_\rho f \rangle.
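
The identity \operatorname{Stab}_\rho[f] = \langle f, T_\rho f \rangle can be checked directly with the same kind of brute-force Python sketch (helper names ours, small n only):

    from itertools import product

    def noise_operator(f, n, rho):
        """Return T_rho f as a dict over {-1,1}^n: (T_rho f)(x) = E_{y ~ N_rho(x)}[f(y)]."""
        cube = list(product([-1, 1], repeat=n))
        def transition(x, y):
            p = 1.0
            for xi, yi in zip(x, y):
                p *= (1 + rho) / 2 if xi == yi else (1 - rho) / 2
            return p
        return {x: sum(f(y) * transition(x, y) for y in cube) for x in cube}

    maj3 = lambda x: 1 if sum(x) > 0 else -1
    rho = 0.6
    Tf = noise_operator(maj3, 3, rho)
    stab = sum(maj3(x) * Tf[x] for x in Tf) / len(Tf)   # <f, T_rho f> under the uniform inner product
    print(stab)                                          # 0.504, the noise stability of Maj_3 at rho = 0.6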


Hypercontractivity

For 1 \leq q < \infty, the L_q-norm of a function f\colon \{-1,1\}^n \to \mathbb{R} is defined by
: \|f\|_q = \sqrt[q]{\operatorname{E}[|f|^q]}.
We also define \|f\|_\infty = \max_{x \in \{-1,1\}^n} |f(x)|. The hypercontractivity theorem states that for any q > 2 and q' = 1/(1-1/q),
: \|T_\rho f\|_q \leq \|f\|_2 \quad \text{if } \rho \leq 1/\sqrt{q-1}, \qquad \text{and} \qquad \|T_\rho f\|_2 \leq \|f\|_{q'} \quad \text{if } \rho \leq \sqrt{q'-1}.
Hypercontractivity is closely related to the logarithmic Sobolev inequalities of functional analysis. A similar result for q < 2 is known as ''reverse hypercontractivity''.
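
Hypercontractivity can be spot-checked numerically. The Python sketch below (our own helpers; a single random real-valued function on four bits, so an illustration rather than a proof) verifies \|T_\rho f\|_q \leq \|f\|_2 at \rho = 1/\sqrt{q-1} for q = 4:

    import random
    from itertools import product

    n, q = 4, 4
    rho = 1 / (q - 1) ** 0.5               # the largest rho for which (2, q)-hypercontractivity holds
    cube = list(product([-1, 1], repeat=n))
    random.seed(0)
    f = {x: random.gauss(0, 1) for x in cube}   # a random real-valued function, given as a table of values

    def T(table, rho):
        """Apply the noise operator to a table of values on {-1,1}^n."""
        def transition(x, y):
            p = 1.0
            for xi, yi in zip(x, y):
                p *= (1 + rho) / 2 if xi == yi else (1 - rho) / 2
            return p
        return {x: sum(table[y] * transition(x, y) for y in cube) for x in cube}

    def norm(table, r):
        """L_r norm under the uniform measure."""
        return (sum(abs(v) ** r for v in table.values()) / len(table)) ** (1 / r)

    lhs, rhs = norm(T(f, rho), q), norm(f, 2)
    print(lhs, "<=", rhs, lhs <= rhs)      # the hypercontractive bound holds for this instance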


''p''-Biased analysis

In many situations the input to the function is not uniformly distributed over \{-1,1\}^n, but instead has a bias toward -1 or 1. In these situations it is customary to consider functions over the domain \{0,1\}^n. For 0 < p < 1, the ''p''-biased measure \mu_p is given by
: \mu_p(x) = p^{\sum_i x_i} (1-p)^{n - \sum_i x_i}.
This measure can be generated by choosing each coordinate independently to be 1 with probability p and 0 with probability 1-p. The classical Fourier characters are no longer orthogonal with respect to this measure. Instead, we use the following characters:
: \omega_S(x) = \left(\sqrt{\frac{1-p}{p}}\right)^{\#\{i \in S \,:\, x_i = 1\}} \left(-\sqrt{\frac{p}{1-p}}\right)^{\#\{i \in S \,:\, x_i = 0\}}.
The ''p''-biased Fourier expansion of f is the expansion of f as a linear combination of ''p''-biased characters:
: f = \sum_{S \subseteq [n]} \hat{f}(S) \omega_S.
We can extend the definitions of influence and the noise operator to the ''p''-biased setting by using their spectral definitions.
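
A quick Python sketch (brute force over \{0,1\}^3; helper names ours) confirming that the p-biased characters \omega_S are orthonormal with respect to \mu_p:

    from itertools import product, combinations
    from math import sqrt

    n, p = 3, 0.3
    cube = list(product([0, 1], repeat=n))
    mu = lambda x: p ** sum(x) * (1 - p) ** (n - sum(x))   # the p-biased product measure

    def omega(S, x):
        """omega_S(x): each i in S contributes sqrt((1-p)/p) if x_i = 1 and -sqrt(p/(1-p)) if x_i = 0."""
        val = 1.0
        for i in S:
            val *= sqrt((1 - p) / p) if x[i] == 1 else -sqrt(p / (1 - p))
        return val

    subsets = [S for k in range(n + 1) for S in combinations(range(n), k)]
    gram = {(S, T): sum(mu(x) * omega(S, x) * omega(T, x) for x in cube)
            for S in subsets for T in subsets}
    print(all(abs(gram[S, T] - (S == T)) < 1e-9 for S in subsets for T in subsets))   # True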


Influence

The i'th influence is given by
: \operatorname{Inf}_i[f] = \sum_{S \ni i} \hat{f}(S)^2 = p(1-p) \operatorname{E}[(f-f^{\oplus i})^2].
The total influence is the sum of the individual influences:
: \operatorname{Inf}[f] = \sum_{i=1}^n \operatorname{Inf}_i[f] = \sum_S |S| \hat{f}(S)^2.


Noise operator

A pair of \rho-correlated random variables can be obtained by choosing x,z \sim \mu_p independently and y \sim N_\rho(x), where N_\rho is given by
: y_i = \begin{cases} x_i & \text{with probability } \rho, \\ z_i & \text{with probability } 1-\rho. \end{cases}
The noise operator is then given by
: (T_\rho f)(x) = \sum_{S \subseteq [n]} \rho^{|S|} \hat{f}(S) \omega_S(x) = \operatorname{E}_{y \sim N_\rho(x)}[f(y)].
Using this we can define the noise stability and the noise sensitivity, as before.


Russo–Margulis formula

The Russo–Margulis formula (also called the Margulis–Russo formula) states that for monotone Boolean functions f\colon \{0,1\}^n \to \{0,1\},
: \frac{d}{dp} \operatorname{E}_{x \sim \mu_p}[f(x)] = \frac{\operatorname{Inf}[f]}{p(1-p)} = \sum_{i=1}^n \Pr[f \neq f^{\oplus i}].
Both the influence and the probabilities are taken with respect to \mu_p, and on the right-hand side we have the average sensitivity of f. If we think of f as a property, then the formula states that as p varies, the derivative of the probability that f occurs at p equals the average sensitivity at p. The Russo–Margulis formula is key for proving sharp threshold theorems such as Friedgut's.
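
A numerical sanity check of the formula (a Python sketch with our own helper names; Maj_3 on \{0,1\}^3, exact enumeration plus a finite-difference derivative):

    from itertools import product

    n = 3
    maj = lambda x: 1 if sum(x) >= 2 else 0              # a monotone Boolean function on {0,1}^3
    cube = list(product([0, 1], repeat=n))
    mu = lambda x, p: p ** sum(x) * (1 - p) ** (n - sum(x))

    def expectation(p):
        return sum(maj(x) * mu(x, p) for x in cube)

    def average_sensitivity(p):
        flip = lambda x, i: x[:i] + (1 - x[i],) + x[i + 1:]
        return sum(mu(x, p) * sum(maj(x) != maj(flip(x, i)) for i in range(n)) for x in cube)

    p, h = 0.3, 1e-6
    print((expectation(p + h) - expectation(p - h)) / (2 * h))   # numerical d/dp E_{mu_p}[f], about 1.26
    print(average_sensitivity(p))                                # sum_i Pr_{mu_p}[f != f^{(+i)}] = 6p(1-p) = 1.26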


Gaussian space

One of the deepest results in the area, the invariance principle, connects the distribution of functions on the Boolean cube \{-1,1\}^n to their distribution on ''Gaussian space'', which is the space \mathbb{R}^n endowed with the standard n-dimensional Gaussian measure. Many of the basic concepts of Fourier analysis on the Boolean cube have counterparts in Gaussian space:
* The counterpart of the Fourier expansion in Gaussian space is the Hermite expansion, which is an expansion to an infinite sum (converging in L^2) of multivariate Hermite polynomials.
* The counterpart of total influence or average sensitivity for the indicator function of a set is Gaussian surface area, which is the Minkowski content of the boundary of the set.
* The counterpart of the noise operator is the Ornstein–Uhlenbeck operator (related to the Mehler transform), given by (U_\rho f)(x) = \operatorname{E}_{z \sim N(0,I_n)}[f(\rho x + \sqrt{1-\rho^2}\,z)], or alternatively by (U_\rho f)(x) = \operatorname{E}[f(y)], where x,y is a pair of \rho-correlated standard Gaussians.
* Hypercontractivity holds (with appropriate parameters) in Gaussian space as well.
Gaussian space is more symmetric than the Boolean cube (for example, it is rotation invariant), and supports continuous arguments which may be harder to get through in the discrete setting of the Boolean cube. The invariance principle links the two settings, and allows deducing results on the Boolean cube from results on Gaussian space.


Basic results


Friedgut–Kalai–Naor theorem

If f\colon \{-1,1\}^n \to \{-1,1\} has degree at most 1, then f is either constant, equal to a coordinate, or equal to the negation of a coordinate. In particular, f is a ''dictatorship'': a function depending on at most one coordinate. The Friedgut–Kalai–Naor theorem, also known as the ''FKN theorem'', states that if f ''almost'' has degree 1 then it is ''close'' to a dictatorship. Quantitatively, if f\colon \{-1,1\}^n \to \{-1,1\} and \|f^{>1}\|^2 < \varepsilon, then f is O(\varepsilon)-close to a dictatorship, that is, \|f - g\|^2 = O(\varepsilon) for some Boolean dictatorship g, or equivalently, \Pr[f \neq g] = O(\varepsilon) for some Boolean dictatorship g. Similarly, a Boolean function of degree at most d depends on at most C \cdot 2^d coordinates, making it a ''junta'' (a function depending on a constant number of coordinates), where C is an absolute constant equal to at least 1.5, and at most 4.41, as shown by Wellens. The Kindler–Safra theorem generalizes the Friedgut–Kalai–Naor theorem to this setting. It states that if f\colon \{-1,1\}^n \to \{-1,1\} satisfies \|f^{>d}\|^2 < \varepsilon then f is O(\varepsilon)-close to a Boolean function of degree at most d.


Kahn–Kalai–Linial theorem

The Poincaré inequality for the Boolean cube (which follows from formulas appearing above) states that for a function f\colon \{-1,1\}^n \to \mathbb{R},
: \operatorname{V}[f] \leq \operatorname{Inf}[f] \leq \deg f \cdot \operatorname{V}[f].
This implies that \max_i \operatorname{Inf}_i[f] \geq \frac{\operatorname{V}[f]}{n}. The Kahn–Kalai–Linial theorem, also known as the ''KKL theorem'', states that if f is Boolean then
: \max_i \operatorname{Inf}_i[f] = \Omega\left(\operatorname{V}[f] \cdot \frac{\log n}{n}\right).
The bound given by the Kahn–Kalai–Linial theorem is tight, and is achieved by the ''Tribes'' function of Ben-Or and Linial, an OR of m disjoint ANDs of width w:
: (x_{1,1} \land \cdots \land x_{1,w}) \lor \cdots \lor (x_{m,1} \land \cdots \land x_{m,w}).
The Kahn–Kalai–Linial theorem was one of the first results in the area, and was the one introducing hypercontractivity into the context of Boolean functions.
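
For illustration only (a tiny instance, far from the asymptotic regime of the theorem), the following Python sketch builds a Tribes function with m = 2 tribes of width w = 2 and computes its coordinate influences by brute force:

    from itertools import product

    def tribes(m, w):
        """OR of m disjoint ANDs of width w, as a 0/1-valued function on {0,1}^(m*w)."""
        return lambda x: 1 if any(all(x[t * w + j] for j in range(w)) for t in range(m)) else 0

    def influences(f, n):
        cube = list(product([0, 1], repeat=n))
        flip = lambda x, i: x[:i] + (1 - x[i],) + x[i + 1:]
        return [sum(f(x) != f(flip(x, i)) for x in cube) / len(cube) for i in range(n)]

    m, w = 2, 2
    f = tribes(m, w)
    print(influences(f, m * w))   # [0.375, 0.375, 0.375, 0.375]: flipping bit i matters exactly when
                                  # i's tribe-mate is 1 and no other tribe is already all ones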


Friedgut's junta theorem

If f\colon \{-1,1\}^n \to \{-1,1\} is an M-junta (a function depending on at most M coordinates) then \operatorname{Inf}[f] \leq M according to the Poincaré inequality. Friedgut's theorem is a converse to this result. It states that for any \varepsilon > 0, the function f is \varepsilon-close to a Boolean junta depending on \exp(\operatorname{Inf}[f]/\varepsilon) coordinates. Combined with the Russo–Margulis lemma, Friedgut's junta theorem implies that for every p, every monotone function is close to a junta with respect to \mu_q for some q \approx p.


Invariance principle

The invariance principle generalizes the Berry–Esseen theorem to non-linear functions. The Berry–Esseen theorem states (among other things) that if f = \sum_{i=1}^n c_i x_i and no c_i is too large compared to the rest, then the distribution of f over \{-1,1\}^n is close to a normal distribution with the same mean and variance. The invariance principle (in a special case) informally states that if f is a multilinear polynomial of bounded degree over x_1,\ldots,x_n and all influences of f are small, then the distribution of f under the uniform measure over \{-1,1\}^n is close to its distribution in Gaussian space. More formally, let \psi be a univariate Lipschitz function, let f = \sum_{S \subseteq [n]} \hat{f}(S) \chi_S, let k = \deg f, and let \varepsilon = \max_i \sum_{S \ni i} \hat{f}(S)^2. Suppose that \sum_{S \neq \emptyset} \hat{f}(S)^2 \leq 1. Then
: \left|\operatorname{E}_{x \sim \{-1,1\}^n}[\psi(f(x))] - \operatorname{E}_{g \sim N(0,I_n)}[\psi(f(g))]\right| = O(k 9^k \varepsilon).
By choosing appropriate \psi, this implies that the distributions of f under both measures are close in CDF distance, which is given by \sup_t |\Pr[f(x) < t] - \Pr[f(g) < t]|. The invariance principle was the key ingredient in the original proof of the ''Majority is Stablest'' theorem.


Some applications


Linearity testing

A Boolean function f\colon \{-1,1\}^n \to \{-1,1\} is ''linear'' if it satisfies f(xy) = f(x)f(y), where xy = (x_1y_1,\ldots,x_ny_n). It is not hard to show that the Boolean linear functions are exactly the characters \chi_S. In property testing we want to test whether a given function is linear. It is natural to try the following test: choose x,y \in \{-1,1\}^n uniformly at random, and check that f(xy) = f(x)f(y). If f is linear then it always passes the test. Blum, Luby and Rubinfeld showed that if the test passes with probability 1-\varepsilon then f is O(\varepsilon)-close to a Fourier character. Their proof was combinatorial. Bellare et al. gave an extremely simple Fourier-analytic proof, that also shows that if the test succeeds with probability 1/2 + \varepsilon, then f is correlated with a Fourier character. Their proof relies on the following formula for the success probability of the test:
: \frac{1}{2} + \frac{1}{2} \sum_{S \subseteq [n]} \hat{f}(S)^3.
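
A Python sketch of the BLR test (our own helper names; Maj_3 is used as a deliberately non-linear example), comparing a Monte Carlo estimate of the acceptance probability with the exact Fourier expression:

    import random
    from itertools import product, combinations
    from math import prod

    n = 3
    f = lambda x: 1 if sum(x) > 0 else -1           # Maj_3: not linear, but correlated with the characters

    def blr_pass_rate(f, n, trials=200000, seed=0):
        rng = random.Random(seed)
        passes = 0
        for _ in range(trials):
            x = tuple(rng.choice([-1, 1]) for _ in range(n))
            y = tuple(rng.choice([-1, 1]) for _ in range(n))
            xy = tuple(a * b for a, b in zip(x, y))
            passes += f(xy) == f(x) * f(y)
        return passes / trials

    cube = list(product([-1, 1], repeat=n))
    fhat = {S: sum(f(x) * prod(x[i] for i in S) for x in cube) / len(cube)
            for k in range(n + 1) for S in combinations(range(n), k)}
    print(blr_pass_rate(f, n))                              # Monte Carlo estimate, about 0.625
    print(0.5 + 0.5 * sum(c ** 3 for c in fhat.values()))   # exact success probability: 0.625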


Arrow's theorem

Arrow's impossibility theorem states that for three or more candidates, the only unanimous voting rule for which there is always a Condorcet winner is a dictatorship. The usual proof of Arrow's theorem is combinatorial. Kalai gave an alternative proof of this result in the case of three candidates using Fourier analysis. If f\colon \{-1,1\}^n \to \{-1,1\} is the rule that assigns a winner among two candidates given their relative orders in the votes, then the probability that there is a Condorcet winner given a uniformly random vote is \frac{3}{4} - \frac{3}{4}\operatorname{Stab}_{-1/3}[f], from which the theorem easily follows. The FKN theorem implies that if f is a rule for which there is almost always a Condorcet winner, then f is close to a dictatorship.
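
The three-candidate case can be checked exactly for a small electorate. The Python sketch below (our own helper names; it takes the "uniformly random vote" to mean each voter independently picks a uniformly random ranking) enumerates all profiles of n = 3 voters, decides each pairwise contest by majority, and compares the frequency of a Condorcet winner with Kalai's formula:

    from fractions import Fraction
    from itertools import permutations, product

    candidates = "abc"
    rankings = list(permutations(candidates))
    n = 3
    maj = lambda votes: 1 if sum(votes) > 0 else -1

    def beats(ranking, u, v):
        """+1 if this voter ranks u above v, else -1."""
        return 1 if ranking.index(u) < ranking.index(v) else -1

    def u_beats_v(outcome, u, v):
        return outcome[(u, v)] == 1 if (u, v) in outcome else outcome[(v, u)] == -1

    wins = 0
    profiles = list(product(rankings, repeat=n))
    for profile in profiles:
        # pairwise majority outcomes for the contests a-vs-b, b-vs-c, c-vs-a
        outcome = {(u, v): maj([beats(r, u, v) for r in profile])
                   for u, v in [("a", "b"), ("b", "c"), ("c", "a")]}
        wins += any(all(u_beats_v(outcome, u, v) for v in candidates if v != u) for u in candidates)

    print(Fraction(wins, len(profiles)))                    # 17/18
    stab = Fraction(3, 4) * Fraction(-1, 3) + Fraction(1, 4) * Fraction(-1, 3) ** 3   # Stab_{-1/3}[Maj_3]
    print(Fraction(3, 4) - Fraction(3, 4) * stab)           # Kalai's formula also gives 17/18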


Sharp thresholds

A classical result in the theory of random graphs states that the probability that a G(n,p) random graph is connected tends to e^{-e^{-c}} if p \sim \frac{\log n + c}{n}. This is an example of a ''sharp threshold'': the width of the "threshold window", which is O(1/n), is asymptotically smaller than the threshold itself, which is roughly \frac{\log n}{n}. In contrast, the probability that a G(n,p) graph contains a triangle tends to 1 - e^{-c^3/6} when p \sim \frac{c}{n}. Here both the threshold window and the threshold itself are \Theta(1/n), and so this is a ''coarse threshold''. Friedgut's sharp threshold theorem states, roughly speaking, that a monotone graph property (a graph property is a property which doesn't depend on the names of the vertices) has a sharp threshold unless it is correlated with the appearance of small subgraphs. This theorem has been widely applied to analyze random graphs and percolation. On a related note, the KKL theorem implies that the width of the threshold window is always at most O(1/\log n).


Majority is stablest

Let \operatorname{Maj}_n\colon \{-1,1\}^n \to \{-1,1\} denote the majority function on n coordinates. Sheppard's formula gives the asymptotic noise stability of majority:
: \operatorname{Stab}_\rho[\operatorname{Maj}_n] \longrightarrow 1 - \frac{2}{\pi} \arccos \rho.
This is related to the probability that if we choose x \in \{-1,1\}^n uniformly at random and form y \in \{-1,1\}^n by flipping each bit of x with probability \frac{1-\rho}{2}, then the majority stays the same:
: \operatorname{Stab}_\rho[\operatorname{Maj}_n] = 2\Pr[\operatorname{Maj}_n(x) = \operatorname{Maj}_n(y)] - 1.
There are Boolean functions with larger noise stability. For example, a dictatorship x_i has noise stability \rho. The Majority is Stablest theorem states, informally, that the only functions having noise stability larger than majority have influential coordinates. Formally, for every \varepsilon > 0 there exists \tau > 0 such that if f\colon \{-1,1\}^n \to \{-1,1\} has expectation zero and \max_i \operatorname{Inf}_i[f] \leq \tau, then \operatorname{Stab}_\rho[f] \leq 1 - \frac{2}{\pi} \arccos \rho + \varepsilon. The first proof of this theorem used the invariance principle in conjunction with an isoperimetric theorem of Borell in Gaussian space; since then more direct proofs were devised. Majority is Stablest implies that the Goemans–Williamson approximation algorithm for MAX-CUT is optimal, assuming the unique games conjecture. This implication, due to Khot et al., was the impetus behind proving the theorem.
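
A Monte Carlo Python sketch (our own helper names; estimates only) illustrating Sheppard's formula: the noise stability of Maj_n, computed from the identity \operatorname{Stab}_\rho[f] = 2\Pr[f(x)=f(y)]-1 with y a \rho-correlated copy of x, approaches 1 - \tfrac{2}{\pi}\arccos\rho as n grows:

    import random
    from math import acos, pi

    def stab_majority(n, rho, trials=50000, seed=0):
        rng = random.Random(seed)
        agree = 0
        for _ in range(trials):
            x = [rng.choice([-1, 1]) for _ in range(n)]
            # keep each bit with probability (1 + rho)/2, i.e. flip it with probability (1 - rho)/2
            y = [xi if rng.random() < (1 + rho) / 2 else -xi for xi in x]
            agree += (sum(x) > 0) == (sum(y) > 0)
        return 2 * agree / trials - 1

    rho = 0.5
    for n in (3, 11, 101):
        print(n, stab_majority(n, rho))      # approaches the limit as n grows (0.40625 exactly at n = 3)
    print("limit:", 1 - 2 / pi * acos(rho))  # Sheppard: 1 - (2/pi) * arccos(0.5) = 1/3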

